# Ultra-low bit quantization

**Holo1 3B GGUF** (Other license): Holo1-3B is a Transformer-based multimodal model focused on visual document retrieval; it performs strongly on the WebVoyager benchmark while balancing accuracy and cost. *Image-to-Text · Transformers · English · Mungert · 583 downloads · 0 likes*

**Holo1 7B GGUF** (Apache-2.0): The Holo1-7B GGUF model is part of the Surfer-H system and targets multimodal tasks such as visual document retrieval; it is particularly strong at web-page interaction and web monitoring, achieving high accuracy at low cost. *Image-to-Text · Transformers · English · Mungert · 663 downloads · 0 likes*

**Qwq 32B ArliAI RpR V4 GGUF** (Apache-2.0): A text generation model based on Qwen/QwQ-32B, specialized for role-playing and creative writing, with support for ultra-low-bit quantization and long conversations. *Large Language Model · Transformers · English · Mungert · 523 downloads · 2 likes*

**Kanana 1.5 8b Instruct 2505 GGUF** (Apache-2.0): Kanana 1.5 is the latest release in the Kanana series, with significant improvements in coding, mathematics, and function calling; it handles inputs of up to 32K tokens natively and up to 128K tokens with YaRN. *Large Language Model · Transformers · Multilingual · Mungert · 606 downloads · 2 likes*

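The Kanana entry above notes a native 32K window that stretches to 128K tokens with YaRN. As a rough illustration of what that looks like in practice, here is a minimal llama-cpp-python sketch; the file name, context size, and scaling settings are assumptions for illustration, not values published for this model.

```python
import llama_cpp
from llama_cpp import Llama

# Sketch only: extend a 32K-native model to a ~128K window with YaRN rope scaling.
# File name and settings below are illustrative assumptions, not published values.
llm = Llama(
    model_path="kanana-1.5-8b-instruct-2505-q4_k_m.gguf",     # hypothetical local GGUF file
    n_ctx=131072,                                              # request a ~128K token window
    rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN,  # constant name can vary across binding versions
    yarn_orig_ctx=32768,                                       # the model's native training context
)

out = llm("Summarize the following report:\n<very long text>", max_tokens=200)
print(out["choices"][0]["text"])
```
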
**Medgemma 4b It GGUF** (Other license): MedGemma-4B-IT is a medical multimodal model based on Gemma 3 that understands both medical text and images, suited to building medical AI applications. *Image-to-Text · Transformers · Mungert · 637 downloads · 2 likes*

**Medgemma 27b Text It GGUF** (Other license): MedGemma-27B-Text-IT is a medical-domain large language model based on the Gemma 3 architecture, optimized for medical text processing and offered in multiple quantization variants for different hardware. *Large Language Model · Transformers · Mungert · 1,464 downloads · 3 likes*

**Qwenlong L1 32B GGUF** (Apache-2.0): QwenLong-L1-32B is a large language model designed for long-context reasoning; trained with reinforcement learning, it performs strongly on several long-context question-answering benchmarks and handles complex reasoning tasks effectively. *Large Language Model · Transformers · Mungert · 927 downloads · 7 likes*

**Dans PersonalityEngine V1.3.0 24b GGUF** (Apache-2.0): Dans-PersonalityEngine-V1.3.0-24b is a multi-purpose model series fine-tuned on more than 50 specialist datasets, supporting multilingual and domain-specific tasks. *Large Language Model · Transformers · Mungert · 678 downloads · 2 likes*

**Qwen3 30B A6B 16 Extreme GGUF**: An ultra-low-bit quantized model derived from Qwen/Qwen3-30B-A3B-Base, supporting a 32K context length and suited to a range of hardware environments. *Large Language Model · Transformers · Mungert · 1,321 downloads · 1 like*

**Llama 3.1 Nemotron Nano 4B V1.1 GGUF** (Other license): Llama-3.1-Nemotron-Nano-4B-v1.1 is a large language model optimized from Llama 3.1 that strikes a good balance between accuracy and efficiency, suited to scenarios such as AI agents and chatbots. *Large Language Model · Transformers · English · Mungert · 2,177 downloads · 1 like*

**Opencodereasoning Nemotron 32B IOI GGUF** (Apache-2.0): A large language model based on Qwen2.5-32B-Instruct, post-trained specifically for code generation and reasoning, supporting a 32K context length and permitted for both commercial and non-commercial use. *Large Language Model · Transformers · Mungert · 1,317 downloads · 2 likes*

**UI TARS 1.5 7B GGUF** (Apache-2.0): UI-TARS-1.5-7B is a multimodal model that performs strongly on tasks such as image-text conversion; its quantization scheme maintains high accuracy at very low bit widths. *Text-to-Image · Transformers · Mungert · 2,526 downloads · 3 likes*

**Josiefied Qwen3 8B Abliterated V1 GGUF**: A quantized version of Qwen3-8B that uses IQ-DynamicGate ultra-low-bit quantization to improve memory efficiency and inference speed. *Large Language Model · Mungert · 559 downloads · 1 like*

**Phi 4 Mini Reasoning GGUF** (MIT): Phi-4-mini-reasoning is a lightweight open model built on synthetic data with a focus on high-quality, reasoning-dense examples, further fine-tuned for advanced mathematical reasoning. *Large Language Model · Transformers · Mungert · 3,592 downloads · 3 likes*

**Foundation Sec 8B GGUF** (Apache-2.0): Foundation-Sec-8B is a language model built for cybersecurity applications; based on the Llama-3.1 architecture and pre-trained on a large corpus of cybersecurity text, it understands the concepts, terminology, and practices of the security domain. *Large Language Model · Transformers · English · Mungert · 7,603 downloads · 4 likes*

**Qwen2.5 7B Instruct GGUF** (Apache-2.0): Qwen2.5-7B-Instruct is an instruction-tuned model based on Qwen2.5-7B, optimized for text generation and especially for chat scenarios. *Large Language Model · English · Mungert · 706 downloads · 4 likes*

**Olympiccoder 7B GGUF** (Apache-2.0): OlympicCoder-7B is a code generation model built on Qwen2.5-Coder-7B-Instruct; it uses IQ-DynamicGate ultra-low-bit quantization and is designed for memory-constrained environments. *Large Language Model · English · Mungert · 849 downloads · 3 likes*

**Phi 2 GGUF** (MIT): phi-2 is a text generation model offered with IQ-DynamicGate ultra-low-bit quantization (1-2 bits), suitable for natural language processing and code generation tasks. *Large Language Model · Multilingual · Mungert · 472 downloads · 2 likes*

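To put the ultra-low-bit claim in perspective, weight memory scales roughly with parameter count times bits per weight, so dropping from 16-bit to 2-bit weights cuts the footprint by about 8x. The sketch below is back-of-the-envelope arithmetic only; it ignores GGUF metadata, tensors kept at higher precision, and KV-cache memory.

```python
# Back-of-the-envelope estimate of weight memory for a quantized checkpoint.
# Ignores GGUF metadata, tensors kept at higher precision, and KV-cache memory.
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1024**3

N_PARAMS = 2.7e9  # phi-2 parameter count, per its public model card
for bpw in (16, 8, 4, 2, 1.75):  # fp16 baseline down to ultra-low-bit levels
    print(f"{bpw:>5} bits/weight: ~{approx_weight_gib(N_PARAMS, bpw):.2f} GiB")
```
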
**GLM Z1 32B 0414 GGUF** (MIT): GLM-Z1-32B-0414 is a 32B-parameter multilingual text generation model supporting Chinese and English, released under the MIT license. *Large Language Model · Multilingual · Mungert · 994 downloads · 3 likes*

**GLM 4 32B 0414 GGUF** (MIT): The GLM-4-32B-0414 GGUF release is a family of capable text generation models offered in a range of quantization formats to suit different hardware and memory budgets. *Large Language Model · Transformers · Multilingual · Mungert · 817 downloads · 4 likes*

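For releases like the one above that ship many quantization variants, the usual workflow is to pick the file that fits the available RAM or VRAM and load it directly. A minimal llama-cpp-python sketch of that workflow follows; the repository id, file pattern, and settings are placeholders rather than values confirmed by this listing.

```python
from llama_cpp import Llama

# Sketch only: download and load one quantization variant of a GGUF release.
# Repository id, file pattern, and settings are placeholders, not confirmed values.
llm = Llama.from_pretrained(
    repo_id="Mungert/GLM-4-32B-0414-GGUF",  # hypothetical Hugging Face repo id
    filename="*q4_k_m.gguf",                # pick the variant that fits your RAM/VRAM budget
    n_ctx=8192,
    n_gpu_layers=-1,                        # offload all layers to GPU if possible; 0 = CPU only
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}]
)
print(resp["choices"][0]["message"]["content"])
```

Lower-bit variants trade accuracy for memory, so a common approach is to start with a 4-bit file and only drop to 2-bit or below if the model does not otherwise fit.
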
**Llama 3.1 Nemotron 70B Instruct HF GGUF**: A model fine-tuned from Meta Llama-3.1-70B-Instruct and aligned with the NVIDIA HelpSteer2 dataset, supporting text generation tasks. *Large Language Model · English · Mungert · 1,434 downloads · 3 likes*

**Orpheus 3b 0.1 Ft GGUF** (Apache-2.0): An ultra-low-bit quantized model based on the Llama-3-8B architecture, using IQ-DynamicGate adaptive 1-2 bit quantization and suited to memory-constrained environments. *Large Language Model · English · Mungert · 1,427 downloads · 1 like*

**Olmo 2 0325 32B Instruct GGUF** (Apache-2.0): An instruction-tuned model based on OLMo-2-0325-32B-DPO, using IQ-DynamicGate ultra-low-bit quantization and optimized for memory-constrained environments. *Large Language Model · English · Mungert · 15.57k downloads · 2 likes*

**Qwen2.5 VL 3B Instruct GGUF**: Qwen2.5-VL-3B-Instruct is a 3B-parameter multimodal model for image-text generation tasks, with vision support specifically optimized for llama.cpp. *Text-to-Image · English · Mungert · 10.44k downloads · 8 likes*

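Vision-enabled GGUF builds such as the Qwen2.5-VL entry above generally ship the language model alongside a separate multimodal projector (mmproj) file, and llama.cpp-based runtimes pair the two at load time. The sketch below illustrates that pattern with llama-cpp-python's LLaVA-style chat handler; the handler class, file names, and prompt layout are assumptions, and the specifics for Qwen2.5-VL may differ.

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # illustrative; Qwen2.5-VL may need a different handler

# Pair the language model with its multimodal projector; both file names are placeholders.
chat_handler = Llava15ChatHandler(clip_model_path="qwen2.5-vl-3b-instruct-mmproj-f16.gguf")
llm = Llama(
    model_path="qwen2.5-vl-3b-instruct-q4_k_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

resp = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///path/to/page.png"}},
        {"type": "text", "text": "What does this document page say?"},
    ],
}])
print(resp["choices"][0]["message"]["content"])
```
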
**Llama 3.1 Nemotron Nano 8B V1 GGUF** (Other license): An 8B-parameter model based on the Llama-3 architecture, with memory usage reduced through IQ-DynamicGate ultra-low-bit quantization. *Large Language Model · Transformers · English · Mungert · 2,088 downloads · 4 likes*

**Mistral Small 3.1 24B Instruct 2503 GGUF** (Apache-2.0): An instruction-tuned model based on Mistral-Small-3.1-24B-Base-2503, distributed in GGUF format with IQ-DynamicGate ultra-low-bit quantization. *Large Language Model · Multilingual · Mungert · 10.01k downloads · 7 likes*

**Mistral 7B Instruct V0.2 GGUF** (Apache-2.0): Mistral-7B-Instruct-v0.2 is an instruction-tuned model based on the Mistral-7B architecture, supporting text generation tasks and optimized for memory efficiency with IQ-DynamicGate ultra-low-bit quantization. *Large Language Model · Mungert · 742 downloads · 2 likes*

**Mistral 7B Instruct V0.1 GGUF** (Apache-2.0): Mistral-7B-Instruct-v0.1 is a fine-tuned model based on Mistral-7B-v0.1 that supports text generation tasks; it uses IQ-DynamicGate ultra-low-bit quantization, making it suitable for memory-constrained deployments. *Large Language Model · Mungert · 632 downloads · 3 likes*
